library(tidyverse)
library(ggplot2)
library(plotly)
library(here)
# setwd(here('GitHub', 'CUNY_MSDA', 'Fall_2017', 'DATA_606', 'Project Proposal'))
#
# Load the files from the working directory
#
amendments_raw <- read.csv("amendment_list.csv")
members_raw <- read.csv("congress_terms.csv")
#
# Remove empty column, and remove all rows with missing data
#
bills <- read.csv("bills93-114.csv", header = TRUE, na.strings = c('', 'NULL')) %>%
  select(-7)
bills <- bills[complete.cases(bills), ]
Is there a relationship between the average age of members of congress and the number of constitutional amendments proposed?
The average age of congressional representatives has been steadily climbing since the Second World War. The current (115th) congress is among the oldest in its history. How has this affected the effectiveness of congress? Are older representatives more or less active?
I plan to explore this via proxy, by looking at all the constitutional amendments proposed from the first congress through the 113th and recording the age of each bill's sponsors. Additionally, I will look for any interesting tidbits in the data, such as the most active years and which states' representatives propose the most legislation.
The amendment list was retrieved from Kaggle, while the members list was taken from FiveThirtyEight. Another source is from the Wall Street Journal. For the analysis, I used a different dataset, retrieved from CongressionalBills.org.
The list of 11,000+ amendments was compiled by staff and volunteers of the National Archives and Records Administration. The list of representatives was compiled by The UnitedStates Project (House members), and The New York Times Congress API (senate).
Each case represents a constitutional amendment proposed in congress. There are a total of 11,797 cases in this dataset.
The response variable is legislative activity and is numerical.
The explanatory variable is median age of congressional representatives and is numerical.
This is an observational study.
This is a large enough sample of bills passed that we can generalize the results to the overall 'population'.
The data cannot be used to establish causal links, since it's only an observational study.
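As a rough sketch of how the two variables could be lined up, the snippet below aggregates toy data to one row per congress and correlates median age with the number of proposed amendments. The data frames `toy_members` and `toy_amendments` and all of their values are made up for illustration; only the column names `congress` and `age` mirror the real files.

```r
# Hypothetical toy data: one row per member-term, one row per proposed amendment.
toy_members <- data.frame(
  congress = c(80, 80, 80, 81, 81, 81, 82, 82, 82),
  age      = c(48, 52, 55, 50, 54, 58, 53, 57, 61)
)
toy_amendments <- data.frame(congress = c(80, 80, 81, 81, 81, 82, 82, 82, 82))

# Explanatory variable: median age per congress.
median_age <- aggregate(age ~ congress, data = toy_members, FUN = median)
names(median_age)[2] <- "median_age"

# Response variable: number of amendments proposed per congress.
n_by_congress <- table(toy_amendments$congress)
activity <- data.frame(
  congress     = as.numeric(names(n_by_congress)),
  n_amendments = as.vector(n_by_congress)
)

# One row per congress, with both variables side by side.
per_congress <- merge(median_age, activity, by = "congress")
per_congress
cor(per_congress$median_age, per_congress$n_amendments)
```

A correlation computed this way describes association only, which is consistent with the observational nature of the study.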
#
# Tidy the datasets
#
# Keep only the relevant columns: column 5 and columns 7 through the second-to-last
amendments <- amendments_raw %>%
  select(5, 7:(ncol(amendments_raw) - 1))
# Use regex to shift errant data to their appropriate columns
pat <- "\\D{3,}"
for (i in seq_len(nrow(amendments))) {
  if (grepl(pat, amendments[i, "month"])) {
    amendments[i, "year"] <- amendments[i, "month"]
    amendments[i, "month"] <- amendments[i, "day"]
    amendments[i, "day"] <- amendments[i, "congress"]
    amendments[i, "congress"] <- amendments[i, "congressional_session"]
    amendments[i, "congressional_session"] <- amendments[i, "joint_resolution_chamber"]
  }
}
amendments$year <- gsub("\\D{4}$", "", amendments$year)
members <- members_raw %>%
  select(-c(3, 7))
Let's take a look at what the data has to say.
#
# Which years had the most bills?
#
by_year_graph <- ggplot(amendments, aes(year)) +
  geom_bar() +
  scale_x_discrete(breaks = seq(1788, 2014, 20))
by_year_graph <- ggplotly(by_year_graph)
by_year_graph
There is a noticeable spike from the 1960s through the 1980s; my guess is that it is related to the civil rights movement. The most common titles/descriptions across all bills are:
head(summary(amendments$title_or_description_from_source))
##      Equal rights for men and women      Equal rights regardless of sex
##                                 601                                 399
##                Balancing the budget                       Right to vote
##                                 288                                 246
##            Prayer in public schools Apportionment of State legislatures
##                                 241                                 208
Three of the top four most common amendments are indeed related to civil rights.
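The counts above come from calling `summary()` on a factor column. Since R 4.0.0, `read.csv()` reads strings as character by default, and `summary()` on a character vector no longer returns per-value counts. A minimal alternative sketch that works for either column type is to tabulate and sort explicitly. The vector `toy_titles` is invented for illustration.

```r
# Hypothetical toy titles standing in for the real title/description column.
toy_titles <- c("Right to vote", "Balancing the budget", "Right to vote",
                "Prayer in public schools", "Right to vote", "Balancing the budget")

# Tabulate each distinct title, then sort from most to least frequent.
title_counts <- sort(table(toy_titles), decreasing = TRUE)
head(title_counts, 3)
```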
amendments <- amendments %>%
  filter(!sponsor_state_or_territory %in% "")
by_state_graph <- ggplot(amendments, aes(sponsor_state_or_territory)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 90))
#scale_x_discrete(breaks=seq(1788, 2014, 20))
by_state_graph <- ggplotly(by_state_graph)
by_state_graph
New York congressmen have proposed the most amendments, followed by those from Texas and California.
#
# Create new df of unique congressmen - remove duplicates
#
members_unique <- members %>%
  distinct(lastname, birthday, .keep_all = TRUE)
table(members_unique$party)
##
##   AL    D    I   ID    L    R
##    2 1662   11    2    1 1541
table(members_unique$incumbent)
##
##   No  Yes
## 2782  437
summary(members$age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   25.00   45.40   53.00   53.31   60.55   98.10
The two major parties dominate, and are roughly even in number.
Because duplicates were removed, the table reports that most congressmen were not incumbents; with the duplicates included, the incumbent count would surely be several times larger.
The average member of congress is about 53 years old at the time of their inauguration (this summary covers both chambers, not only senators). I'm not sure what's more surprising: that there was a congress member aged 25, or that there was a congress member aged 98!
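To illustrate the point about incumbents, here is a small sketch of how keeping only the first row per member shrinks the "Yes" count. The data frame `toy_terms` is invented; `duplicated()` is used as a base-R stand-in for `distinct(lastname, birthday, .keep_all = TRUE)`, which likewise keeps the first row for each pair.

```r
# Hypothetical toy data: member "A" serves three terms, member "B" serves one.
toy_terms <- data.frame(
  lastname  = c("A", "A", "A", "B"),
  birthday  = c("1900-01-01", "1900-01-01", "1900-01-01", "1910-05-05"),
  incumbent = c("No", "Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)

# All terms: a re-elected member contributes one "Yes" per additional term.
all_terms <- table(toy_terms$incumbent)

# First row per lastname/birthday pair only, as in the deduplicated table.
first_terms <- toy_terms[!duplicated(toy_terms[, c("lastname", "birthday")]), ]
first_only <- table(first_terms$incumbent)

all_terms
first_only
```

In the toy data the two "Yes" rows disappear after deduplication, which is the same mechanism that pushes the real incumbent count down.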
age_year_graph <- ggplot(members, aes(congress, age)) +
  geom_point() +
  # `fun` replaces the deprecated `fun.y` argument (ggplot2 3.3.0 and later)
  stat_summary(aes(y = age, group = 1), fun = mean, colour = "red", geom = "line", group = 1)
age_year_graph <- ggplotly(age_year_graph)
age_year_graph
Congress is definitely getting older. The average age of congress members in the 80th congress was \(\approx 52.5\) years. In the 113th congress, it was \(\approx 57.6\) years!
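The red line in the plot is the mean age within each congress. As a sketch, the same per-congress means can be computed directly with `tapply()`. The data frame `toy_ages` and its values are invented and merely stand in for `members`.

```r
# Hypothetical toy data with the same column names as `members`.
toy_ages <- data.frame(
  congress = c(80, 80, 113, 113),
  age      = c(50, 55, 56, 60)
)

# Mean age within each congress; the result is a vector named by congress number.
mean_age <- tapply(toy_ages$age, toy_ages$congress, mean)
mean_age

# Change in mean age between the two congresses in the toy data.
mean_age[["113"]] - mean_age[["80"]]
```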
\(H_0\): Each bill has an equal chance to pass, regardless of the sponsor's party, age, district, terms served, etc.
\(H_1\): Each bill does not have an equal chance, i.e. there are variables that can alter the probability of passing.
Independence: We can assume that the bills are independent of each other. Sample size: Each sample has at least 5 cases.
formula <- 'PLaw ~ Age + ComC + CumHServ + District + Gender + Majority + Party + State'
model <- glm(formula= formula, data=bills, family=binomial(link="logit"))
summary(model)
##
## Call:
## glm(formula = formula, family = binomial(link = "logit"), data = bills)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0223 -0.3716 -0.3069 -0.2514 3.0325
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.141365 0.245519 -16.868 < 2e-16 ***
## Age 0.017993 0.001471 12.235 < 2e-16 ***
## ComC 0.515103 0.037889 13.595 < 2e-16 ***
## CumHServ 0.018060 0.001880 9.606 < 2e-16 ***
## District -0.002248 0.001098 -2.047 0.040650 *
## Gender -0.222755 0.069479 -3.206 0.001346 **
## Majority 0.692814 0.035549 19.489 < 2e-16 ***
## PartyR 0.672183 0.035726 18.815 < 2e-16 ***
## StateAL -0.311598 0.244820 -1.273 0.203103
## StateAR 0.285120 0.241857 1.179 0.238446
## StateAZ -0.177144 0.262651 -0.674 0.500028
## StateCA -0.461073 0.227616 -2.026 0.042799 *
## StateCO -0.063225 0.248109 -0.255 0.798856
## StateCT -0.817284 0.249150 -3.280 0.001037 **
## StateDE -0.303827 0.323655 -0.939 0.347866
## StateFL -0.473599 0.237894 -1.991 0.046503 *
## StateGA 0.068132 0.240360 0.283 0.776825
## StateHI -1.130915 0.304718 -3.711 0.000206 ***
## StateIA -0.835370 0.256716 -3.254 0.001138 **
## StateID -0.260772 0.278561 -0.936 0.349201
## StateIL -0.966694 0.234794 -4.117 3.84e-05 ***
## StateIN -0.815532 0.251223 -3.246 0.001169 **
## StateKS -0.828676 0.258334 -3.208 0.001338 **
## StateKY -0.400810 0.246119 -1.629 0.103415
## StateLA -0.197388 0.241399 -0.818 0.413539
## StateMA -0.500389 0.234997 -2.129 0.033226 *
## StateMD -0.279738 0.238939 -1.171 0.241700
## StateME -0.490469 0.275579 -1.780 0.075113 .
## StateMI -0.695460 0.236053 -2.946 0.003217 **
## StateMN -0.703081 0.254018 -2.768 0.005643 **
## StateMO -0.356222 0.243423 -1.463 0.143361
## StateMS -0.263297 0.247054 -1.066 0.286538
## StateMT 0.088535 0.248909 0.356 0.722070
## StateNC 0.274487 0.239143 1.148 0.251053
## StateND -1.051264 0.290165 -3.623 0.000291 ***
## StateNE -0.542104 0.269634 -2.011 0.044376 *
## StateNH -1.887889 0.446190 -4.231 2.33e-05 ***
## StateNJ -0.894611 0.236750 -3.779 0.000158 ***
## StateNM 0.031731 0.251805 0.126 0.899721
## StateNV -0.857734 0.322405 -2.660 0.007804 **
## StateNY -0.988787 0.226496 -4.366 1.27e-05 ***
## StateOH -0.527559 0.234389 -2.251 0.024399 *
## StateOK 0.235634 0.247035 0.954 0.340161
## StateOR -0.496259 0.256493 -1.935 0.053016 .
## StatePA -0.770831 0.230655 -3.342 0.000832 ***
## StateRI -0.657776 0.273822 -2.402 0.016297 *
## StateSC 0.136647 0.240183 0.569 0.569405
## StateSD -0.346970 0.264489 -1.312 0.189570
## StateTN -0.173729 0.242436 -0.717 0.473622
## StateTX 0.071055 0.232054 0.306 0.759454
## StateUT -0.428544 0.287851 -1.489 0.136549
## StateVA 0.062025 0.239214 0.259 0.795413
## StateVT -0.154349 0.366113 -0.422 0.673325
## StateWA -0.423932 0.243018 -1.744 0.081081 .
## StateWI -0.809815 0.246743 -3.282 0.001031 **
## StateWV -0.709740 0.253026 -2.805 0.005032 **
## StateWY 0.180191 0.261022 0.690 0.489988
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 78617 on 173967 degrees of freedom
## Residual deviance: 74644 on 173911 degrees of freedom
## AIC: 74758
##
## Number of Fisher Scoring iterations: 6
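Because the model is fit on the log-odds scale, the coefficients are easier to read after exponentiating them into odds ratios. The sketch below does that for a handful of estimates copied from the summary output; with the fitted object in hand, `exp(coef(model))` would do the same for every term.

```r
# Estimates copied from the coefficient table (log-odds scale).
log_odds <- c(Age = 0.017993, ComC = 0.515103, CumHServ = 0.018060,
              Majority = 0.692814, PartyR = 0.672183)

# Exponentiate to get odds ratios: the multiplicative change in the odds of a
# bill becoming law for a one-unit increase in each predictor, holding the
# other predictors fixed.
odds_ratios <- exp(log_odds)
round(odds_ratios, 3)
```

Read this way, each additional year of sponsor age multiplies the odds of passage by roughly 1.018, and a one-unit increase in `Majority` roughly doubles them.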
anova(model, test = "Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: PLaw
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 173967 78617
## Age 1 1291.38 173966 77326 < 2.2e-16 ***
## ComC 1 718.35 173965 76607 < 2.2e-16 ***
## CumHServ 1 183.60 173964 76424 < 2.2e-16 ***
## District 1 74.92 173963 76349 < 2.2e-16 ***
## Gender 1 22.46 173962 76326 2.144e-06 ***
## Majority 1 115.27 173961 76211 < 2.2e-16 ***
## Party 1 212.46 173960 75999 < 2.2e-16 ***
## State 49 1355.08 173911 74644 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
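The null hypothesis that no predictor matters corresponds to comparing the full model against the intercept-only model. As a sketch, that overall likelihood-ratio test can be reproduced from the two deviances and their degrees of freedom as reported in the model summary:

```r
# Values copied from the model summary output.
null_deviance  <- 78617
resid_deviance <- 74644
df_null        <- 173967
df_resid       <- 173911

# Under H0 the drop in deviance is approximately chi-squared distributed,
# with degrees of freedom equal to the number of parameters added.
lr_stat <- null_deviance - resid_deviance
lr_df   <- df_null - df_resid
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(statistic = lr_stat, df = lr_df, p_value = p_value)
```

The statistic of 3973 on 56 degrees of freedom agrees, up to rounding, with the sum of the sequential deviances in the table, and the p-value is effectively zero.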
age_chart <- ggplot(bills, aes(x = Age)) +
  geom_bar()
ggplotly(age_chart)
The answer to our initial question, whether age plays a role in legislation passed, seems to vary with the type of legislation. Constitutional amendments tend to be proposed by older representatives, while overall, the introduction and sponsorship of laws leans toward younger ones.
There were some obvious findings in the data: the average age of congress members is climbing; the two major parties dominate in congress; most representatives are incumbents, i.e. most serve multiple terms; and representatives from the more populous states tend to write more laws.
But there were also some more surprising ones: most of the amendments were proposed in the civil rights era; representatives from states that introduce less legislation often get theirs signed into law; there are variables that can affect a bill's chances of getting signed into law.
Unfortunately, it's difficult to find proper data on this subject. There are projects, both governmental and NGO-run, that are working toward 'opening' a lot of it, but they are still in their nascent stages and, thus far, only have information on the most recent congresses. There are a lot of gaps, and not many overlaps, in the data, which makes it practically impossible to do any predictive analysis. When these projects mature and more data is available, that would be an interesting project to undertake.